This tutorial will show you how to access data from the web using some python scripts (don’t worry, you will only have to copy and paste) and get it into QGIS. We will then create some maps and hopefully they will be fun and useful!

We will consider a few ways of getting data from the web, and for this we will need to use some python.

QGIS provides a built-in console where you can type python commands and get the result. This console is a great way to learn scripting and also to do quick data processing. Open the Python Console by going to Plugins > Python Console:

This will open a window where you can paste Python code. But first I will describe each bit of code, talk about what it means, and explain what you would have to change if you wanted to make this relevant for you.

Writing your first bit of code

Just to demystify the process a bit, we will start with writing something super simple. We will create a variable that contains some text. In python you assign a value to a variable with the = sign.

So here we will create a variable called hello and give it the value “hello world :)” because we’re happy people. You can do this with the following code:

hello = "hello world :)"

Now you have assigned the value hello world :) to the variable called hello. Try pasting this into the Python Console:

Now this variable, called hello, is in your environment.

Try what happens if you just type its name (hello) into the console!
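For example (print just shows the value explicitly, the same way the console echoes it back when you type a variable’s name):

```python
# assign the text to a variable, as above
hello = "hello world :)"

# typing hello in the console echoes its value;
# print() does the same thing explicitly
print(hello)
```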

Congratulations you have just written your first bit of code! Woohoo!


Setting up the python console and your environment for downloading some data

Setting up a working directory

So firstly, before we begin to do any work, we should create a working directory. This will be a folder where you will save all your data, and also where you will read data in from. To do so, first create a folder and identify the path to this folder.

On a mac you can do this by dragging the folder into Terminal:

On a PC, you can do this by copying the folder path from the address bar in the Windows Explorer window, circled in red below:

NOTE: on a PC you will have to change the direction of the slashes. So instead of a backslash (\) you want a forward slash (/).
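If you prefer, python can flip the slashes for you. A small sketch (the win_path value here is just an invented example path):

```python
# a Windows-style path as copied from Explorer
# (the r"" prefix stops python treating backslashes as escapes)
win_path = r"C:\Users\me\Documents\gis_data"

# swap every backslash for a forward slash
my_path = win_path.replace("\\", "/")
print(my_path)
```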

You can then just copy this file path, and use it to create a text variable called my_path. To do this, copy the code below, and paste it into a text editor, like Notepad.

my_path = "paste your path here"

Then replace the text paste your path here with the file path to your folder, which you copied. Make sure it’s all inside the quotation marks (“”) and that there are no spaces at either end.

Now copy this bit of code, paste it into the python console, and press enter. Just as we created the hello object to hold the text “hello world :)”, we have now created a my_path object to hold the path to the folder where you will be saving everything. After you have done this, you will need to copy and paste the bit of code below. This sets the path for saving etc. using the my_path object you just created above.


import os
#the os module lets python talk to your operating system

path = my_path
os.chdir(path)
#set the working directory to your folder

What you are doing here, is making sure that this folder is where we will be saving all the data, and also if we tell QGIS to read in some files, it will look for them in this folder as well.
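You can check this worked by asking python for the current working directory. A minimal sketch, using a temporary folder as a stand-in for your own my_path:

```python
import os
import tempfile

# stand-in for your own folder path
my_path = tempfile.gettempdir()
os.chdir(my_path)

# getcwd() reports the current working directory
print(os.getcwd())
```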

Importing some modules

Python is great because it has many modules that people have created, which enable you to easily do certain operations. Normally you will have to download these onto your computer, before running them. You can see here for some information about downloading modules for yourself on your PC. You will need to install something called pip, for which there is a quick tutorial here, and then you can use that to install packages. There is a quick video tutorial of how to install a package here.

But on the lab PCs you will have these already installed, so you just have to import them into this session in QGIS. To do so, you will have to copy and paste the below code:


from bs4 import BeautifulSoup
#The beautiful soup module is what we will be using for scraping data from webpages

import csv
#You will also need the csv module to save csv

import requests
#And you will need the requests module to get data from a URL

Under each import I make a brief comment about what we are using each module for.
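As a tiny taster of the csv module before we use it for real, here it is writing some made-up rows to an in-memory buffer instead of a file (this sketch assumes python 3):

```python
import csv
import io

# some made-up rows to write
rows = [["Council", "Reports"], ["Barnet", "120"]]

# write to an in-memory buffer instead of a file on disk
buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerows(rows)

print(buf.getvalue())
```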

Webscraping

Web scraping (web harvesting or web data extraction) is data scraping used for extracting data from websites. While web scraping can be done manually by a software user, the term typically refers to automated processes implemented using a bot or web crawler. It is a form of copying, in which specific data is gathered and copied from the web, typically into a central local database or spreadsheet, for later retrieval or analysis. (Wikipedia)

In the lecture I described scraping data from Fix My Street. That one was a larger scale project that took a bit of a longer time, but I will walk you through a small-scale project here, to give you an idea (and the skills) for doing this yourself.

So let’s have a look at the data they have here, about the number of reports in each local authority. You can see this in a table here.

So this is interesting, and we might want to use this to make some maps. Now you can easily get this data into QGIS using the web scraping python module Beautiful Soup.
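Before we run it on the live page, here is the same idea on a hard-coded scrap of HTML, so you can see what Beautiful Soup is actually doing (the table contents here are invented; the nicetable class just mirrors the one on the Fix My Street page):

```python
from bs4 import BeautifulSoup

# an invented mini version of the reports table
html = """
<table class="nicetable">
  <tr><th>Council</th><th>Reports</th></tr>
  <tr><td>Barnet</td><td>120</td></tr>
  <tr><td>Camden</td><td>95</td></tr>
</table>
"""

soup = BeautifulSoup(html, "html.parser")
table = soup.find("table", attrs={"class": "nicetable"})

# header cells are <th>, data cells are <td>
headers = [th.text for th in table.find_all("th")]
rows = [[td.text for td in tr.find_all("td")] for tr in table.find_all("tr")]
rows = [r for r in rows if r]  # drop the header row, which has no <td> cells

print(headers)
print(rows)
```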

What you need for that to happen, is to copy and paste the below code into the Python Console:

[I will break this up and explain code more, but this is for testing purposes now]

url = "https://www.fixmystreet.com/reports"
response = requests.get(url)
code = response.content
#download the page and keep hold of its HTML

soup = BeautifulSoup(code, "html.parser")
table = soup.find('table', attrs={"class" : "nicetable"})
#parse the HTML and find the table of reports (it has the class "nicetable")

headers = [header.text for header in table.find_all('th')]
list_of_rows = []
for row in table.find_all('tr'):
    list_of_cells = []
    for cell in row.find_all('td'):
        text = cell.text.strip()
        text = text.encode('utf-8')
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)
#grab the header cells, then the text of every cell in every row

outfile = open("./output.csv", "w")
writer = csv.writer(outfile, lineterminator="\n")
writer.writerow(headers)
writer.writerows(list_of_rows)
outfile.close()
#closing the file makes sure everything is written to disk before QGIS reads it

#now read this into the QGIS window

uri = "file:///" + path + "/output.csv?type=csv&geomType=none"
layer = QgsVectorLayer(uri, "fms_table", "delimitedtext")
QgsMapLayerRegistry.instance().addMapLayer(layer)

Now a new layer called fms_table should appear in your QGIS Layers window.

[Section here about getting a geojson of UK to which I can join this and create a thematic map based on number of reports in each council. Then some discussion about what this map means]

Twitter

Signing up as a developer with Twitter

First, you will need to create a twitter account if you don’t already have one. You will have to visit twitter.com and sign up.

Then you go to apps.twitter.com/ and click on Create New App.

Fill out the details. You can name your app whatever you like. Under ‘Website’ you can put a placeholder, since you don’t have a site for your app yet; just make sure it’s a URL that doesn’t already exist! Fill out the form and then click on ‘Create your twitter application’.

Now you should have some credentials, which you will need for getting some data from Twitter. You can see your credentials by clicking on the Keys and Access Tokens tab, circled in red below:

Getting some twitter data into QGIS

[Some description about the process here]

Lucky for us, someone has again written a plugin for getting some tweets into QGIS, so I will not make you write more code right now.

Instead, we will use the twitter2qgis plugin.

NOTE: you will need to have the tweepy module installed for this plugin to run properly. You do this the same way that you would have installed the other modules yourself (eg BeautifulSoup, see above! The csv module comes bundled with python, so you don’t need to install that one). When you install the plugin, I think it gives you a temporary install of tweepy - but this won’t always work! So it’s safer to have the module installed properly!

OK but for now, let’s progress. Bring up the dialogue window by clicking on Web > twitter2qgis > collect tweets:

It will open up a dialogue window, asking for your twitter details, which we got earlier by registering.

Fill this out with your access token etc information, which you acquired in the above section!

Then also specify a keyword you want to search for. Here I used “night tube” because I was wondering what people were tweeting about the (relatively) new 24-hour tube service in London. Specify the number of tweets you want to get back (for now let’s choose 10, so we don’t have to wait too long to see some results) and say that you want the file of the raw tweets. When all that’s done, and your form looks like the one below, hit “OK”.

Now you must be patient. Only about 1-2 per cent of tweets are geocoded, and this plugin will only take the geocoded ones. So it will have to cycle through quite a few tweets before it finds you 10 geocoded ones with your required keywords. Patience is always key with these things. You may get some sort of indicator that QGIS is not responding (because it’s working), so sit tight for a bit. Check your (possibly new) twitter account!

Now once it is done, you will see a new window, asking you what coordinate system you would like to map your tweets in:

Here we will select WGS 84

You should now see a new layer appear in your layers list. Right click and view the attribute table to have a peek at the sorts of tweets we gathered. In this case, we actually see that we got a whole bunch of tweets talking about “night” but not related to the night tube! This is not exciting.

[Some stuff here about how to search for what you want and then some more mapping with the tweets once we have them back]